AIDA: Identifying Code Switching in Informal Arabic Text

Authors

  • Heba Elfardy
  • Mohamed Al-Badrashiny
  • Mona Diab
Abstract

In this paper, we present the latest version of our system for identifying linguistic code switching in Arabic text. The system relies on Language Models and a tool for morphological analysis and disambiguation for Arabic to identify the class of each word in a given sentence. We evaluate the performance of our system on the test datasets of the shared task at the EMNLP workshop on Computational Approaches to Code Switching (Solorio et al., 2014). The system yields an average token-level Fβ=1 score of 93.6%, 77.7% and 80.1%, on the first, second, and surprise-genre test-sets, respectively, and a tweet-level Fβ=1 score of 4.4%, 36% and 27.7%, on the same test-sets.
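The token-level scores above are F-measures with β = 1, the harmonic mean of precision and recall. As a minimal sketch of how such a per-class score can be computed from gold and predicted word labels (not the shared task's official scorer), consider the following. The label names "msa" and "da" and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter


def f_beta(precision, recall, beta=1.0):
    """F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_class_f1(gold, predicted):
    """Return {label: F1} for two equal-length sequences of token labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong for this token
            fn[g] += 1  # gold label g was missed for this token
    scores = {}
    for label in set(gold) | set(predicted):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        prec = tp[label] / n_pred if n_pred else 0.0
        rec = tp[label] / n_gold if n_gold else 0.0
        scores[label] = f_beta(prec, rec)
    return scores


# Hypothetical labels: "msa" (Modern Standard Arabic), "da" (dialectal Arabic).
gold = ["msa", "da", "msa", "da"]
pred = ["msa", "msa", "msa", "da"]
scores = per_class_f1(gold, pred)
average = sum(scores.values()) / len(scores)
```

Averaging the per-class values, as in the last line, is one common way to obtain a single figure; whether the reported averages were computed exactly this way is an assumption here.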


Similar resources

Token Level Identification of Linguistic Code Switching

Typically native speakers of Arabic mix dialectal Arabic and Modern Standard Arabic in the same utterance. This phenomenon is known as linguistic code switching (LCS). It is a very challenging task to identify these LCS points in written text where we don’t have an accompanying speech signal. In this paper, we address automatic identification of LCS points in Arabic social media text by identif...


Translation of Power and Solidarity Pronouns in Qur’anic Rhetoric

Translation of the Holy Quran can be difficult for translators in terms of accuracy and translatability. Sometimes translators fail to render the Quranic thoughts because of the lack of language features in target languages. This results in an unfavorable interpretation. One of the challenging aspects of translating the Quran is reference switching as rhetorical devices, which are widespread i...


Mixed Language and Code-Switching in the Canadian Hansard

While there has been lots of interest in code-switching in informal text such as tweets and online content, we ask whether code-switching occurs in the proceedings of multilingual institutions. We focus on the Canadian Hansard, and automatically detect mixed language segments based on simple corpus-based rules and an existing word-level language tagger. Manual evaluation shows that the performa...


Addressing Code-Switching in French/Algerian Arabic Speech

This study focuses on code-switching (CS) in French/Algerian Arabic bilingual communities and investigates how speech technologies, such as automatic data partitioning, language identification and automatic speech recognition (ASR) can serve to analyze and classify this type of bilingual speech. A preliminary study carried out using a corpus of Maghrebian broadcast data revealed a relatively hi...


High capacity steganography tool for Arabic text using 'Kashida'

Steganography is the ability to hide secret information in a cover-media such as sound, pictures and text. A new approach is proposed to hide a secret into Arabic text cover media using "Kashida", an Arabic extension character. The proposed approach is an attempt to maximize the use of "Kashida" to hide more information in Arabic text cover-media. To approach this, some algorithms have been des...




Publication date: 2014